1. INTRODUCTION

Scene-change and motion detection are two basic steps that play an essential, guiding role in both simple and complex environments, where most outdoor surveillance videos are recorded [1]-[6]. However, variations in the static background of some unfamiliar scenes still make correctly separating the foreground from the background a widespread challenge in surveillance video analysis [7]-[11]. Background subtraction is a necessary task in video applications such as surveillance, where it is used to track, index, retrieve, and capture essential metadata about people, cars, and other moving objects, either in real time or offline [12]-[16]. It is the starting point for higher-level video processing tasks such as object identification, detection, and tracking over video sequences in recent applications and research [17]-[21]. Many background models rely on the initial frames to build and update the background, assuming that it contains no movement over time. However, this assumption does not always hold because of the dynamic features of crowded scene backgrounds, so other aspects need to be addressed. Over the years, numerous background subtraction models have been proposed and built using different techniques, yet their performance, generally judged by objective and subjective measures, still leaves a gap [22], [23]. In [24], the MOG2 model uses a Gaussian mixture probability density and recursive equations to update its parameters and select a suitable number of mixture components for each pixel. The K-nearest neighbours (KNN) method [25] performs pixel-level background subtraction with a simple non-parametric, adaptive density estimate. The method in [26], GMG, combines background estimation, a Bayesian decision rule, and Kalman filtering with Gale-Shapley matching to track multiple objects using adaptive statistics. In [27], an adaptive local singular value decomposition (SVD) binary pattern (LSBP) feature is proposed. It operates on small regions of a frame and tends to improve detection under illumination changes, noise, and shadows. In [28], a fast object detection algorithm is proposed that first removes noisy pixels and then applies adaptive thresholds to capture moving objects as foreground. However, although [29]-[33] propose new techniques, separating noisy background pixels in outdoor environments remains an open problem.

The proposed threshold adaptation and XOR accumulation (TAXA) algorithm uses the information surrounding each pixel to decide whether the pixel belongs to the foreground or the background. The decision is made by a novel use of XOR logic to combine adaptive thresholds derived from statistical techniques, as detailed in section 3. Section 2 reviews popular related work, explains the objective and subjective measures, and discusses the weaknesses of prominent methods. Section 4 compares the results of the algorithm with the alternative prominent methods, and section 5 concludes the paper.


2. EVALUATION MEASURES AND WEAKNESSES IN TECHNIQUES 
2.1. Objective measures 

Precision (quality) is the ratio of correctly retrieved foreground pixels to all retrieved foreground pixels, as given in (1). It has recently been used to assess foreground extraction techniques objectively. In particular, precision indicates how well a moving object is separated from small scattered pixels that may be noise rather than objects in the frame. It is computed against a ground truth frame from the counts of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) pixels. By contrast, subjective evaluation is based on human visual inspection of the resulting frames. For a tested frame, TP comprises the pixels that truly belong to a moving object's body and are correctly classified as foreground, whereas FP comprises the pixels that actually belong to the background but are wrongly identified as part of a moving object and therefore classified as foreground. TN comprises the pixels that truly belong to the background and are correctly classified as background, while FN comprises the pixels that actually belong to the foreground but are wrongly classified as background. The ground truth frame is the reference frame with ideal labels for measurement, in which fixed background objects are correctly separated from moving foreground objects without any noise.


$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$
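The following is a minimal sketch of (1) in Python with NumPy, assuming the result and ground truth frames are single-channel binary masks of equal size in which any non-zero pixel is foreground. The function name `precision` and the toy masks are illustrative, not taken from the authors' code.

```python
import numpy as np


def precision(result_mask: np.ndarray, ground_truth: np.ndarray) -> float:
    """Precision = TP / (TP + FP) for two binary masks of the same shape.

    Any non-zero pixel counts as foreground (white); zero counts as background.
    """
    pred = result_mask != 0
    truth = ground_truth != 0
    tp = np.count_nonzero(pred & truth)   # detected and truly foreground
    fp = np.count_nonzero(pred & ~truth)  # detected but truly background (noise)
    if tp + fp == 0:
        return 0.0  # assumption: an empty detection is scored as zero precision
    return tp / (tp + fp)


# Toy 3x3 example: 4 pixels detected, 3 of them are true foreground.
gt = np.array([[0, 255, 255],
               [0, 255, 0],
               [0, 0, 0]], dtype=np.uint8)
res = np.array([[0, 255, 255],
                [0, 255, 255],
                [0, 0, 0]], dtype=np.uint8)
print(precision(res, gt))  # 0.75
```

TN and FN do not enter (1); they matter only for other measures such as recall.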

2.2. Weaknesses in techniques 

Many of the studies mentioned above claim, based on their own subjective findings, to have overcome many unexpected changes in indoor and outdoor lighting and noisy backgrounds. However, the performance of recent techniques available in OpenCV and other libraries is remarkably weak, as the following example shows. The original frame No. 847.jpg [33] is shown in Figure 1(a), the ground truth in Figure 1(b), the MOG2 result in Figure 1(c), the KNN result in Figure 1(d), the GMG result in Figure 1(e), and the LSBP result in Figure 1(f).


Figure 1. Results of applying foreground moving-object extraction methods to frame No. 847.jpg of the video in [33] (11 FPS, recorded at 7:00 AM, 1920x1080 frame size): (a) original frame, (b) ground truth, (c) MOG2, (d) KNN, (e) GMG, and (f) LSBP
A subjective evaluation of Figure 1 shows that the methods differ in how well they distinguish object pixels from background pixels. Noisy white pixels spread into the background (i.e. false positives, FP); they should be removed rather than identified as object pixels. Conversely, the object bodies contain black pixels where white object pixels should be (i.e. false negatives, FN).


3. PROPOSED METHOD 

The present study proposes a hybrid application of statistical techniques (median, Gaussian, and mean) after initializing and updating the background with a running average for each frame. In the first stage, median and Gaussian filters are used to blur every difference frame in order to remove noise. In the second stage, an initial extraction is performed using several thresholds: adaptive Gaussian, adaptive mean, pixel intensity, and Canny edge thresholds. Finally, in the third stage, the novel XOR-bitwise accumulation, together with AND and OR operations, merges the thresholded frames and decides the final separation of pixels into foreground and background. Figure 2 shows the steps of the proposed TAXA algorithm.


[Figure 2 diagram. Stage One, Background Initialize, Update, and Noise Removal: running average background, current frame difference, noisy pixel removal. Stage Two, Threshold Adaptation: adaptive thresholds (Gaussian and mean), pixel intensity threshold (binary threshold). Stage Three, Foreground Mask: bitwise operations (XOR, AND, OR).]

Figure 2. The proposed TAXA diagram


In stage one, the first frame is taken as the initial background, and the background is then updated continuously with each subsequent frame using the running average in (2):


$$\text{Aver}(r,c) = (1-\alpha)\,\text{Aver}(r,c) + \alpha\,\text{frame}(r,c) \tag{2}$$


where Aver(r, c) is an accumulator buffer holding the frame's (R, G, and B) channels, which stores the weighted sum of the input frame and the accumulator; the result is a running average of the frame sequence. α (alpha) is a learning factor in the range 0 to 1 that regulates the update speed, i.e. how quickly past frames are forgotten by the accumulator buffer. Figure 3 illustrates this for the original frames No. 1.jpg (a), No. 56.jpg (b), No. 90.jpg (c), and No. 191.jpg (d), each shown with its running average.
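A minimal NumPy sketch of the running average in (2) follows; the function names are hypothetical. OpenCV's `cv2.accumulateWeighted` performs the same weighted update in place, but the sketch avoids the OpenCV dependency so the arithmetic is visible.

```python
import numpy as np


def update_running_average(aver: np.ndarray, frame: np.ndarray, alpha: float) -> np.ndarray:
    """One step of Eq. (2): Aver <- (1 - alpha) * Aver + alpha * frame.

    The update is element-wise, so each (R, G, B) channel is averaged independently.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * aver + alpha * frame.astype(np.float64)


def running_average(frames, alpha: float) -> np.ndarray:
    """Take the first frame as the initial background, then update it with every later frame."""
    it = iter(frames)
    aver = next(it).astype(np.float64)
    for frame in it:
        aver = update_running_average(aver, frame, alpha)
    return aver


# Three uniform 2x2 colour frames with values 0, 100, 100 and alpha = 0.5:
# 0 -> 0.5*0 + 0.5*100 = 50 -> 0.5*50 + 0.5*100 = 75
frames = [np.full((2, 2, 3), v, dtype=np.uint8) for v in (0, 100, 100)]
background = running_average(frames, alpha=0.5)
print(background[0, 0])  # [75. 75. 75.]
```

A small α keeps the background stable but adapts slowly to lasting scene changes; a large α adapts quickly but lets slow-moving objects bleed into the background.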

The absolute difference Diff(r, c) between the current frame CF(r, c) and the running average buffer Aver(r, c), which forms the background, is computed as in (3). In (4), salt-and-pepper noise is then removed from the absolute difference frame using the median filter medBlur (which performs better on this type of noise) with a (3x3) aperture.


$$\text{Diff}(r,c) = \left| CF(r,c) - \text{Aver}(r,c) \right| \tag{3}$$
$$Bf(r,c) = \text{medBlur}(\text{Diff}(r,c), 9) \tag{4}$$
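Steps (3) and (4) can be sketched in NumPy as follows. The helper names are hypothetical, and the sketch uses the (3x3) aperture stated in the text; in OpenCV the same operations are available as `cv2.absdiff` and `cv2.medianBlur`.

```python
import numpy as np


def abs_difference(current: np.ndarray, aver: np.ndarray) -> np.ndarray:
    """Eq. (3): Diff = |CF - Aver|, computed in float so uint8 values cannot wrap around."""
    return np.abs(current.astype(np.float64) - aver.astype(np.float64))


def median_blur_3x3(img: np.ndarray) -> np.ndarray:
    """Eq. (4) sketch: 3x3 median filter, applied to each channel independently.

    Borders are padded by replicating the edge pixels (an assumption of this sketch).
    """
    squeeze = img.ndim == 2
    a = img[:, :, None] if squeeze else img
    h, w = a.shape[:2]
    padded = np.pad(a, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Stack the nine shifted copies so that axis 0 holds each pixel's 3x3 neighbourhood.
    neighbourhoods = np.stack(
        [padded[dr:dr + h, dc:dc + w] for dr in range(3) for dc in range(3)], axis=0
    )
    out = np.median(neighbourhoods, axis=0)
    return out[:, :, 0] if squeeze else out


# A single "salt" pixel in an otherwise unchanged colour frame.
background = np.zeros((5, 5, 3))
current = background.copy()
current[2, 2] = 255
diff = abs_difference(current, background)
clean = median_blur_3x3(diff)
print(diff.max(), clean.max())  # 255.0 0.0
```

An isolated pixel is only one of nine values in its window, so the median discards it, which is why the median filter suits salt-and-pepper noise.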


Gaussian noise is then removed from Bf(r, c) by a Gaussian filter with a (k x k) aperture, where k is selected experimentally, as in (5):


$$\text{Frame}(r,c) = \text{Gaussian}(Bf(r,c)) \tag{5}$$
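A NumPy sketch of the (k x k) Gaussian smoothing in (5) follows, under the assumption that a non-positive sigma should be derived from k using the rule OpenCV documents for `getGaussianKernel`. The helper names are hypothetical. The toy example also illustrates that the Gaussian output at a location is a weighted average rather than one of the source values.

```python
import numpy as np


def gaussian_kernel_1d(k: int, sigma: float = 0.0) -> np.ndarray:
    """Normalised k-tap Gaussian kernel; a non-positive sigma is derived from k."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be a positive odd integer")
    if sigma <= 0:
        sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8  # default rule documented by OpenCV
    x = np.arange(k) - (k - 1) / 2
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return g / g.sum()


def gaussian_blur(img: np.ndarray, k: int, sigma: float = 0.0) -> np.ndarray:
    """Eq. (5) sketch: k x k Gaussian smoothing of each channel, edges replicated.

    The 2-D Gaussian is separable, so it is applied as a horizontal then a vertical pass.
    """
    g = gaussian_kernel_1d(k, sigma)
    pad = k // 2
    squeeze = img.ndim == 2
    a = img.astype(np.float64)
    if squeeze:
        a = a[:, :, None]
    h, w = a.shape[:2]
    p = np.pad(a, ((0, 0), (pad, pad), (0, 0)), mode="edge")
    a = sum(g[i] * p[:, i:i + w] for i in range(k))   # horizontal pass
    p = np.pad(a, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    a = sum(g[i] * p[i:i + h] for i in range(k))      # vertical pass
    return a[:, :, 0] if squeeze else a


spike = np.zeros((7, 7))
spike[3, 3] = 255.0
smoothed = gaussian_blur(spike, k=3)
# The centre output is a weighted average (about 98 here), not one of the source values 0 or 255.
print(round(float(smoothed[3, 3]), 1))
```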
Note that median and Gaussian filters differ in one respect: the output value of a Gaussian filter at the centre location is a weighted average and therefore need not be one of the pixel values in the source frame, whereas the output value of a median filter is always one of the original pixel values of the source frame. The two operations therefore need to be balanced, and the block sizes selected according to experimental findings. Programming tests indicate that removing noise from each colour channel (R, G, and B) separately, before conversion to grayscale, gives better performance. Then, in stage two, the colour frame is converted to the grayscale frame Gray(r, c) as in (6):


$$\text{Gray}(r,c) = 0.299\,R + 0.587\,G + 0.114\,B \tag{6}$$
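Equation (6) is a weighted sum over the channel axis. The sketch below assumes channels in R, G, B order; frames read with `cv2.imread` are in B, G, R order and must be reversed first, as the comment shows. The function name `to_gray` is hypothetical.

```python
import numpy as np

# Luma weights of Eq. (6) (the ITU-R BT.601 coefficients), in R, G, B order.
RGB_WEIGHTS = np.array([0.299, 0.587, 0.114])


def to_gray(frame_rgb: np.ndarray) -> np.ndarray:
    """Eq. (6): Gray = 0.299 R + 0.587 G + 0.114 B for an (H, W, 3) frame in R, G, B order."""
    if frame_rgb.ndim != 3 or frame_rgb.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) frame")
    gray = frame_rgb.astype(np.float64) @ RGB_WEIGHTS   # weighted sum over the channel axis
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)


# OpenCV's cv2.imread returns channels in B, G, R order, so reverse them first:
# gray = to_gray(bgr_frame[:, :, ::-1])
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
print(to_gray(pixels))  # [[ 76 150  29 255]]
```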


The first adaptive threshold, Th1, uses a threshold computed as a Gaussian-weighted sum (a cross-correlation with a (k x k) Gaussian window) over the neighbourhood of pixel location (r, c) minus a constant C, with the window's default standard deviation, as in (7):


$$Th_1(r,c) = \begin{cases} 0 & \text{if } \text{Gray}(r,c) > T(r,c) \\ \text{MaxValue} & \text{otherwise} \end{cases} \tag{7}$$


where T(r, c) is the adaptive Gaussian threshold, computed individually for each pixel as the Gaussian-weighted mean of the (k x k) neighbourhood of (r, c) minus the constant C, and MaxValue is the highest grayscale value (255). In (8), the adaptive mean threshold Th2 uses the plain mean of the neighbourhood of pixel (r, c) minus the constant C:


$$Th_2(r,c) = \begin{cases} 0 & \text{if } \text{Gray}(r,c) > T(r,c) \\ \text{MaxValue} & \text{otherwise} \end{cases} \tag{8}$$


where T(r, c) is the adaptive mean threshold, computed individually as the mean of the neighbourhood of pixel (r, c) minus the constant C, and MaxValue is again the highest grayscale value (255). In (9), the pixel intensity threshold Th3 sets pixel values below 30 to black (0) and all others to white (1); this cutoff reflects the saturation behaviour of human vision.


$$Th_3(r,c) = \begin{cases} 0 & \text{if } \text{Gray}(r,c) < 30 \\ 1 & \text{otherwise} \end{cases} \tag{9}$$
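The thresholds in (7) to (9) can be sketched with NumPy as below. The window size k and constant C are left as parameters because the paper selects them experimentally; the helper names are hypothetical. Equations (7) and (8) appear to correspond to OpenCV's `cv2.adaptiveThreshold` with `ADAPTIVE_THRESH_GAUSSIAN_C` or `ADAPTIVE_THRESH_MEAN_C` and `THRESH_BINARY_INV`, although OpenCV's internal rounding may make its results differ slightly from this floating-point sketch.

```python
import numpy as np

MAX_VALUE = 255  # highest grayscale value, as in (7) and (8)


def _gaussian_weights(k: int) -> np.ndarray:
    sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8  # default sigma derived from the window size
    x = np.arange(k) - (k - 1) / 2
    g = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return g / g.sum()


def _local_mean(gray: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted mean over each k x k neighbourhood (separable weights, edges replicated)."""
    k = len(weights)
    pad = k // 2
    h, w = gray.shape
    p = np.pad(gray.astype(np.float64), pad, mode="edge")
    rows = sum(weights[i] * p[:, i:i + w] for i in range(k))   # shape (h + 2*pad, w)
    return sum(weights[i] * rows[i:i + h] for i in range(k))   # shape (h, w)


def adaptive_threshold(gray: np.ndarray, k: int, C: float, method: str) -> np.ndarray:
    """Eqs. (7) and (8): 0 where Gray > T, MAX_VALUE otherwise, with T = local mean - C."""
    if k < 3 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 3")
    if method == "gaussian":
        weights = _gaussian_weights(k)      # Th1: Gaussian-weighted mean
    elif method == "mean":
        weights = np.full(k, 1.0 / k)       # Th2: plain mean of the window
    else:
        raise ValueError("method must be 'gaussian' or 'mean'")
    T = _local_mean(gray, weights) - C
    return np.where(gray > T, 0, MAX_VALUE).astype(np.uint8)


def intensity_threshold(gray: np.ndarray, level: int = 30) -> np.ndarray:
    """Eq. (9): 0 where Gray < level, 1 otherwise."""
    return np.where(gray < level, 0, 1).astype(np.uint8)


flat = np.full((5, 5), 100, dtype=np.uint8)
print(adaptive_threshold(flat, k=3, C=5, method="mean").max())  # 0 (100 > 95 everywhere)
dip = flat.copy()
dip[2, 2] = 50  # one pixel darker than its neighbourhood mean minus C
print(adaptive_threshold(dip, k=3, C=5, method="mean"))
```

With a positive C, a uniform region falls above its own threshold and maps to 0; only pixels darker than their surroundings by more than C map to MaxValue.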


In (10), the Canny edge detector with low and high thresholds is applied so that all important (real) edges forming the shape of a moving object are recognized and included. However, the thresholds need fine-tuning to the image properties to find the correct real edges.


$$Th_4 = \text{Canny}(\text{low}, \text{high}, \text{Size} = 3) \tag{10}$$
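The low/high threshold rule behind (10) is Canny's hysteresis step. The sketch below implements only that step on a precomputed gradient magnitude array, assuming 8-connectivity; the name `hysteresis` is hypothetical. A full Canny detector (for example `cv2.Canny(image, threshold1, threshold2, apertureSize=3)`, where Size = 3 in (10) presumably denotes the Sobel aperture) also smooths the image, computes gradients, and applies non-maximum suppression.

```python
from collections import deque

import numpy as np


def hysteresis(magnitude: np.ndarray, low: float, high: float) -> np.ndarray:
    """Keep strong pixels (> high) plus weak pixels (>= low) connected to a strong pixel."""
    strong = magnitude > high
    candidate = magnitude >= low          # weak or strong pixels
    edges = strong.copy()
    h, w = magnitude.shape
    queue = deque(zip(*np.nonzero(strong)))
    while queue:                          # grow edges outward from every strong pixel
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and candidate[rr, cc] and not edges[rr, cc]:
                    edges[rr, cc] = True
                    queue.append((rr, cc))
    return edges


# 120 is strong; the weak 50 and 60 beside it survive; the isolated weak 60 does not.
mag = np.array([[0, 50, 120, 60, 0, 60]], dtype=float)
print(hysteresis(mag, low=40, high=100).astype(int))  # [[0 1 1 1 0 0]]
```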


A pixel is recognized as an edge if its gradient magnitude is above the upper threshold, and it is rejected if its gradient is below the lower threshold. A pixel whose gradient lies between the two thresholds is accepted as a real edge only if it is connected to a pixel above the upper threshold. Finally, in stage three, XOR-bitwise accumulation and the basic AND and OR operations are used to decide which pixels belong to the objects and which do not, so that small details (noisy background pixels) are omitted, as in (11):


$$\text{Result} = \text{XOR}(Th_1 \mathbin{\&} Th_2,\ Th_3 + Th_4) \tag{11}$$


where "&" denotes bitwise AND, "+" denotes bitwise OR, and XOR is the exclusive OR (1 XOR 1 = 0, 0 XOR 0 = 0, 1 XOR 0 = 1, 0 XOR 1 = 1); Result is the resulting image. The choice of bitwise gates follows a theoretical rationale, so the operations must be chosen precisely. The first two thresholds, Gaussian and mean (Th1 and Th2), extract the object's body pixels under a strict condition (&), while Th3 and Th4 reinforce the real edges and object boundaries under a tolerant condition (+). XOR accumulation then gathers the resulting pixels and rejects both noisy background pixels and misleading results, namely the cases where Th1 & Th2 equals 1 and Th3 + Th4 also equals 1, taking into account all possible cases of noise generation. This reasoning is consistent with the practical programming experiments. The practical side and the obtained results are elaborated further in the next section.
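A sketch of (11) on NumPy masks follows. Th1 and Th2 take the values 0 and 255 while Th3 takes 0 and 1, so applying raw bitwise operators to the 8-bit values would mix bit patterns; the sketch therefore converts each threshold to a boolean mask first, which is our assumption about the intended semantics. `taxa_combine` is a hypothetical name, not the authors' function.

```python
import numpy as np


def taxa_combine(th1: np.ndarray, th2: np.ndarray, th3: np.ndarray, th4: np.ndarray) -> np.ndarray:
    """Eq. (11): Result = XOR(Th1 AND Th2, Th3 OR Th4); non-zero input pixels count as 1."""
    body = (th1 != 0) & (th2 != 0)        # strict condition: both adaptive thresholds agree
    boundary = (th3 != 0) | (th4 != 0)    # tolerant condition: intensity or Canny edge
    result = np.logical_xor(body, boundary)
    return result.astype(np.uint8) * 255  # white = foreground, black = background


th1 = np.array([[255, 255, 0, 0]], dtype=np.uint8)
th2 = np.array([[255, 255, 255, 0]], dtype=np.uint8)
th3 = np.array([[1, 0, 0, 1]], dtype=np.uint8)
th4 = np.zeros((1, 4), dtype=np.uint8)
# Column 0 is the case body = 1 and boundary = 1, which XOR rejects.
print(taxa_combine(th1, th2, th3, th4))  # [[  0 255   0 255]]
```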
[Figure 3 panels: (a) original frame 1.jpg and its running average at 07:00:00 AM; (b) original frame 56.jpg and its running average at 07:00:05 AM; (c) original frame 90.jpg and its running average at 07:00:08 AM; (d) original frame 191.jpg and its running average at 07:00:17 AM]

Figure 3. The running average technique at different times of an 11 FPS video: (a) original frame no. 1.jpg, (b) original frame no. 56.jpg, (c) original frame no. 90.jpg, and (d) original frame no. 191.jpg


4. RESULTS AND DISCUSSION 
4.1. Images set and ground truth generation 

The proposed TAXA algorithm is compared with the most popular methods available in the OpenCV library on high-definition (HD) videos, which contain fine detail. HD videos have recently emerged as a new challenge for surveillance processing systems. Figures 4(a) to 4(g) show seven frames captured at different times and under different conditions (including moving vehicles, waving trees, dust, sunshine, and shadow) from the fixed-camera video (1920x1080) used for testing.

The ground truth for each frame is produced in the following steps. First, the frame is converted to grayscale. Second, object boundaries are identified manually by a human observer, who marks each object precisely as a black blob, as in Figures 5(a) and 5(b). Third, the Canny edge detector is applied to the frame to detect the true boundaries, as in Figure 5(c). Finally, the ground truth frame is produced, as in Figure 5(d).


Figure 4. Reference frames used to evaluate the HD video recorded at 7:00 AM, frame size 1920x1080: (a) scene 1 no. 636.jpg, (b) scene 2 no. 710.jpg, (c) scene 3 no. 1057.jpg, (d) scene 4 no. 1129.jpg, (e) scene 5 no. 1225.jpg, (f) scene 6 no. 1387.jpg, and (g) scene 7 no. 1789.jpg
Figure 5. Ground truth generation: (a) original frame 710.jpg, (b) conversion to grayscale with objects marked as black blobs, (c) application of the Canny edge detector, and (d) ground truth


4.2. Evaluation results 

The results, presented in Tables 1 to 3, are collected for the seven source frames shown in Figure 4. They are assessed by precision, execution time, CPU usage, memory usage, and a subjective evaluation of all the results in the tables. Precision ratings are classed as excellent (90 to 99%), very good (80 to 89%), good (70 to 79%), and acceptable (60 to 69%). The proposed TAXA algorithm and the alternative algorithms were all implemented with OpenCV (version 4.1.25) and Python 3.7 in the PyCharm editor, on a Core i7 machine with 6 GB of storage. The average results in Table 1 show that the proposed TAXA algorithm achieves promising, excellent precision, the highest among all the prominent algorithms tested. According to Table 2 and Figure 6, TAXA also has the lowest CPU usage (Figure 6(b)) and memory usage (Figure 6(c)) of all the algorithms. In execution time (Figure 6(a)), it is faster than KNN, GMG, and LSBP and close to MOG2, which is designed for real-time operation. For a subjective evaluation against the ground-truth frames used in Table 1, Figure 7 shows the original frame No. 1225.jpg (a), the ground truth of scene 5 (b), and the results of TAXA (c), MOG2 (d), KNN (e), GMG (f), and LSBP (g).
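The rating bands can be expressed as a small helper operating on precision fractions. The sketch treats each band as a half-open interval, so that values such as 0.895 still receive a rating, and uses a hypothetical label for values below 0.60, which the paper does not rate.

```python
def precision_rating(p: float) -> str:
    """Map a precision value in [0, 1] to the rating bands of section 4.2."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    if p >= 0.90:
        return "excellent"
    if p >= 0.80:
        return "very good"
    if p >= 0.70:
        return "good"
    if p >= 0.60:
        return "acceptable"
    return "below acceptable"  # hypothetical label: no band is defined under 0.60


# Average precision values from Table 1.
averages = {"TAXA": 0.90, "MOG2": 0.76, "KNN": 0.83, "GMG": 0.68, "LSBP": 0.61}
for name, p in averages.items():
    print(name, precision_rating(p))
```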


Table 1. Precision measurement results of the applied algorithms

Frames      TAXA   MOG2   KNN    GMG    LSBP
636.jpg     0.87   0.77   0.81   0.74   0
710.jpg     0.88   0.73   0.80   0.69   0
1057.jpg    0.89   0.80   0.82   0.64   0.84
1129.jpg    0.85   0.63   0.77   0.53   0.82
1225.jpg    0.95   0.83   0.87   0.77   0.89
1387.jpg    0.91   0.84   0.89   0.74   0.87
1789.jpg    0.95   0.76   0.85   0.70   0.86
Average     0.90   0.76   0.83   0.68   0.61

[Line graph of results: precision of each applied algorithm (LSBP, GMG, KNN, MOG2, TAXA) for frames 636.jpg to 1789.jpg]


Table 2. Execution time, CPU usage, and memory usage of the applied algorithms on a 1:57-minute video (11 FPS, 1297 frames in total, recorded at 5:37 AM, 2560x1920)

Measure               TAXA    MOG2     KNN      GMG      LSBP
Execution time (s)    210     190      213      835      3200
Memory usage          6.56%   11.27%   10.42%   42.76%   34.38%
CPU usage             20.0%   81.3%    80.5%    59.0%    79.2%


Overall, the proposed TAXA algorithm achieves the highest precision of all the algorithms tested. Compared with the ground truth frame, the pixels belonging to objects are more apparent in the TAXA results than the white noisy pixels belonging to the background. For more reliable results, the evaluation can focus on the object bodies only, following the region of interest (ROI) concept: each reference frame in Figure 4 is cropped to an ROI so that background errors outside the objects are excluded. Under this ROI-based evaluation, the TAXA algorithm again produces excellent results compared with the other algorithms, as shown in Table 3.
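The ROI-based evaluation amounts to computing (1) on a cropped window. Table 3 gives only the ROI sizes, not their positions, so the coordinates in this sketch are hypothetical, as is the function name `roi_precision`.

```python
import numpy as np


def roi_precision(result_mask, ground_truth, top, left, height, width):
    """Precision TP / (TP + FP) computed only inside a rectangular region of interest."""
    window = (slice(top, top + height), slice(left, left + width))
    pred = result_mask[window] != 0
    truth = ground_truth[window] != 0
    tp = np.count_nonzero(pred & truth)
    fp = np.count_nonzero(pred & ~truth)
    return tp / (tp + fp) if tp + fp else 0.0


# Toy 6x6 frame: a 2x2 object plus one noisy background pixel far from it.
gt = np.zeros((6, 6), dtype=np.uint8)
gt[1:3, 1:3] = 255
res = gt.copy()
res[5, 5] = 255
whole = roi_precision(res, gt, 0, 0, 6, 6)   # 4 / (4 + 1) = 0.8
roi = roi_precision(res, gt, 0, 0, 4, 4)     # noise lies outside the ROI: 4 / 4 = 1.0
print(whole, roi)  # 0.8 1.0
```

Cropping removes background noise far from the objects, which is why ROI precision can exceed whole-frame precision for the same result mask.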
Thus, the proposed TAXA algorithm still scores higher than the other algorithms. This indicates that TAXA has sufficient criteria for identifying object shapes in order to extract foreground pixels: applying XOR accumulation to the extracted thresholds extracts the pixels of real foreground objects rather than background pixels, giving the best results among all the prominent algorithms tested.


Table 3. Precision of the applied algorithms on the ROIs of the reference frames in Figure 4

ROI (size)            TAXA   MOG2   KNN    GMG    LSBP
636.jpg (350x350)     0.83   0.77   0.81   0.74   0
710.jpg (1550x450)    0.86   0.73   0.79   0.69   0
1057.jpg (350x350)    0.89   0.83   0.84   0.64   0.86
1129.jpg (550x350)    0.84   0.62   0.76   0.52   0.82
1225.jpg (750x450)    0.97   0.83   0.87   0.76   0.89
1387.jpg (700x350)    0.96   0.83   0.88   0.74   0.88
1789.jpg (700x350)    0.96   0.76   0.85   0.70   0.85
Average               0.90   0.76   0.82   0.68   0.61

[Line graph of results: ROI precision of each applied algorithm (LSBP, GMG, KNN, MOG2, TAXA) for frames 636.jpg to 1789.jpg]


Figure 6. Line chart results of Table 2: (a) execution time in seconds, (b) CPU usage, and (c) memory usage


Figure 7. Evaluation of scene 5, frame no. 1225.jpg of Table 1, across the algorithms against the ground truth: (a) frame no. 1225.jpg, (b) ground truth, (c) TAXA, (d) MOG2, (e) KNN, (f) GMG, and (g) LSBP
5. CONCLUSION

The extraction of foreground objects is still an open and vital task for any intelligent surveillance system based on background modelling. This paper proposes the TAXA algorithm for foreground detection in high-definition videos containing objects of varied shapes with slow and fast motion. The novelty of the TAXA algorithm lies in applying the XOR-accumulation hypothesis to decide whether each pixel belongs to an object or to the background. The proposed algorithm combines median, Gaussian, and mean statistical methods with a running average background and adaptive thresholds. The experimental results outperform prominent OpenCV algorithms on objective measures as well as in subjective assessment. The TAXA algorithm is well suited to object detection in surveillance systems. Future work will add its code to OpenCV and apply it to real-time applications.